Work with categorical variables

frequencies
(M)ANOVA
tukey post-hoc
Check the role of categorical variables
Published: December 2, 2023

Frequencies

When you want to check whether, for instance, more men are present in a specific group, you can get a clean contingency table with a chi-square test using the following code. You can adjust the options as you wish:

library(sjPlot)

tab_xtab(df$VAR1,df$VAR2,
         show.cell.prc = FALSE,
         show.row.prc = TRUE,
         show.col.prc = FALSE,
         show.legend = TRUE,
         show.na = FALSE,
         show.summary=TRUE)
Group      Gender                         Total
           Male           Female
Group 1    28 (50 %)      28 (50 %)       56 (100 %)
Group 2    22 (45.8 %)    26 (54.2 %)     48 (100 %)
Total      50 (48.1 %)    54 (51.9 %)     104 (100 %)

χ² = 0.052 · df = 1 · φ = 0.042 · p = 0.820

Cell contents: observed values, % within Group
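
The summary line can be cross-checked in base R. The sketch below rebuilds the table from the counts shown above. It rests on two assumptions that the sjPlot output does not state: that the reported χ² uses the continuity correction which `chisq.test()` applies to 2×2 tables by default, and that φ is derived from the uncorrected statistic. The object names (`counts`, `row_pct`, `phi`) are illustrative.

```r
# Counts taken from the contingency table above (rows: Group, columns: Gender)
counts <- matrix(c(28, 28,
                   22, 26),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Group = c("Group 1", "Group 2"),
                                 Gender = c("Male", "Female")))

# Row percentages, i.e. % within Group
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

# Chi-square test; for 2x2 tables chisq.test() applies Yates' correction by default
corrected <- chisq.test(counts)

# Phi from the uncorrected chi-square (assumption: this is how phi is derived here)
uncorrected <- chisq.test(counts, correct = FALSE)
phi <- sqrt(unname(uncorrected$statistic) / sum(counts))

print(row_pct)
print(round(c(chi2 = unname(corrected$statistic),
              df   = unname(corrected$parameter),
              phi  = phi,
              p    = corrected$p.value), 3))
```

Under these assumptions the printed values should agree with the summary line of the table (χ² ≈ 0.052, φ ≈ 0.042, p ≈ 0.820).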

(M)ANOVA

The manovaR function of the CaviR package gives a full and informative overview of the role of a categorical variable in predicting several numeric variables.

The function provides:

  • descriptive statistics for each level of the categorical variable

  • univariate analyses with a p-value for statistical significance and a partial eta-squared for practical significance

  • multivariate analysis with Wilks’ Lambda

Partial eta-squared (ηp²) is interpreted against the following benchmarks (Cohen 2013):

  • small when ηp² > .0099

  • medium when ηp² > .0588

  • large when ηp² > .1379
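
To apply the benchmarks listed above to a set of effect sizes, a small helper can do the labelling. This is a minimal sketch: `eta_label` is a hypothetical function and not part of CaviR, and the "below small" label for values under the first threshold is an assumption added for completeness.

```r
# Hypothetical helper (not part of CaviR): label partial eta-squared values
# using the benchmarks .0099, .0588 and .1379
eta_label <- function(eta2) {
  # cut() uses right-closed bins by default, so a value must exceed
  # a threshold to receive the corresponding label
  cut(eta2,
      breaks = c(-Inf, .0099, .0588, .1379, Inf),
      labels = c("below small", "small", "medium", "large"))
}

effect_sizes <- as.character(eta_label(c(0.005, 0.03, 0.10, 0.37)))
print(effect_sizes)
```

The four example values fall into the four categories in order, from "below small" to "large".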

library(CaviR)
manovaR(data[,c('Group','Autonomy','Vitality','Persistence')],
        stand=TRUE, sign = 0.05, tukey = TRUE)
[1] "Grouping variable has only 2 levels. Tukey not applicable"

variables     Group 1         Group 2         F-value   p-value     eta-squared
Autonomy      3.37 (±0.92)    2.31 (±1.09)    57.48     <.001 ***   0.37
Vitality      3.37 (±0.92)    2.31 (±1.09)    41.78     <.001 ***   0.29
Persistence   3.37 (±0.92)    2.31 (±1.09)    28.18     <.001 ***   0.22

Wilks' Lambda = 0.607, F(3, 95) = 20.479, p < .001
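
For readers who want to see where these three kinds of output come from, the same quantities can be computed with base R alone. The sketch below is not the CaviR implementation. It uses simulated data (the data frame `sim` and its group means are made up for illustration) and computes descriptives, univariate F-tests with partial eta-squared, and Wilks' Lambda.

```r
set.seed(42)  # simulated data, for illustration only
n <- 100
sim <- data.frame(
  Group       = factor(rep(c("Group 1", "Group 2"), each = n / 2)),
  Autonomy    = rnorm(n, mean = rep(c(3.4, 2.3), each = n / 2)),
  Vitality    = rnorm(n, mean = rep(c(3.2, 2.5), each = n / 2)),
  Persistence = rnorm(n, mean = rep(c(3.0, 2.6), each = n / 2))
)
outcomes <- c("Autonomy", "Vitality", "Persistence")

# Descriptives: mean of each outcome per level of the categorical variable
means <- aggregate(sim[outcomes], by = list(Group = sim$Group), FUN = mean)

# Univariate analyses: F, p and partial eta-squared per outcome
# (with a single predictor, partial eta-squared = SS_effect / SS_total)
univariate <- t(sapply(outcomes, function(v) {
  tab <- summary(aov(sim[[v]] ~ sim$Group))[[1]]
  ss <- tab[["Sum Sq"]]
  c(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1], eta2 = ss[1] / sum(ss))
}))

# Multivariate analysis with Wilks' Lambda
fit <- manova(cbind(Autonomy, Vitality, Persistence) ~ Group, data = sim)
wilks <- summary(fit, test = "Wilks")$stats

print(means)
print(round(univariate, 3))
print(wilks)
```

Because the data are simulated, the numbers will not match the table above; only the structure of the output is comparable.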

When the number of levels > 2

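When the categorical predictor has more than two levels, the function adds the results of a Tukey post-hoc multiple-comparison analysis to the table as letters. As in a compact letter display, levels that share a letter do not differ significantly from each other. The letters are ordered by the group means, so that A marks the lowest values.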

library(CaviR)
manovaR(data[,c('Groups','Autonomy','Vitality','Persistence')],
        stand=TRUE, sign = 0.05, tukey = TRUE)

variables     Group 1            Group 2            Group 3            Group 4            F-value   p-value     eta-squared
Autonomy      4.31 (±1.19) B     4.23 (±0.95) B     2.73 (±1.46) A     2.16 (±1.16) A     20.18     <.001 ***   0.38
Vitality      4.23 (±1.02) B     4.27 (±0.87) B     2.69 (±1.40) A     2.83 (±1.40) A     13.72     <.001 ***   0.30
Persistence   3.36 (±1.08) BC    3.38 (±0.78) C     2.64 (±1.15) AB    1.96 (±0.93) A     11.67     <.001 ***   0.26

Wilks' Lambda = 0.534, F(9, 226.488) = 7.396, p < .001
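
The pairwise comparisons that underlie such letters can be inspected with base R's `TukeyHSD()`. The sketch below again uses simulated data (`sim4` and its group means are invented for illustration) and shows the comparisons for one outcome only. It does not reproduce the letter assignment that manovaR adds to the table.

```r
set.seed(7)  # simulated data, for illustration only
per_group <- 25
sim4 <- data.frame(
  Groups   = factor(rep(paste("Group", 1:4), each = per_group)),
  Autonomy = rnorm(4 * per_group,
                   mean = rep(c(4.3, 4.2, 2.7, 2.2), each = per_group))
)

# One-way ANOVA followed by Tukey's honest significant differences
fit4 <- aov(Autonomy ~ Groups, data = sim4)
tukey <- TukeyHSD(fit4, "Groups", conf.level = 0.95)

# One row per pair of levels: mean difference, confidence bounds, adjusted p-value
pairwise <- tukey$Groups
print(round(pairwise, 3))

# Pairs of levels that differ at the 5 % level
print(rownames(pairwise)[pairwise[, "p adj"] < 0.05])
```

With four levels there are six pairwise comparisons. Levels whose comparison is not significant would share a letter in the table.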

References

Cohen, Jacob. 2013. Statistical Power Analysis for the Behavioral Sciences. Academic Press.